Walkthrough
Split the original RAG guide into a multi-page RAG section (Ingestion, Retrieval, Service, Schedule Automation), updated navigation and homepage links, revised the ingestion flow (new "Choose ingestion mode" step), added support details for multiple file types, added new media, and created API/service documentation.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client (UI/API)
participant Service as RAG Service
participant Embed as Embedding API
participant Vector as Vector DB
participant Store as Object Storage
rect rgba(200,230,255,0.5)
Client->>Service: POST /upload (file, metadata)
Service->>Store: Persist original file (optional)
Service->>Embed: Request embeddings for chunks
Embed-->>Service: Return embedding vectors
Service->>Vector: Upsert vectors into index
Vector-->>Service: Ack upsert
Service-->>Client: Upload response (status, ids)
end
rect rgba(200,255,200,0.5)
Client->>Service: POST /retrieve (query, params)
Service->>Embed: Embed query text
Embed-->>Service: Return query vector
Service->>Vector: Search (top N, threshold)
Vector-->>Service: Retrieved chunks
Service->>Service: (optional) Rerank / filter results
Service-->>Client: Retrieved results (documents, metadata)
end
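The two flows in the diagram can be sketched in a few lines of Python. This is only an illustrative, in-memory stand-in and not the Devant RAG service: the `embed` function is a toy word-count embedding standing in for the Embedding API, `InMemoryVectorIndex` stands in for the Vector DB, and the names `upload`, `retrieve`, `top_n`, and `threshold` are assumptions chosen to mirror the labels in the diagram.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): L2-normalised word counts, not a real model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(value * b.get(key, 0.0) for key, value in a.items())


class InMemoryVectorIndex:
    """Hypothetical stand-in for the vector database."""

    def __init__(self):
        self._rows = {}  # chunk id -> (vector, chunk text)

    def upsert(self, chunk_id, vector, chunk):
        self._rows[chunk_id] = (vector, chunk)

    def search(self, query_vector, top_n, threshold):
        scored = [
            (cosine(query_vector, vector), chunk_id, chunk)
            for chunk_id, (vector, chunk) in self._rows.items()
        ]
        hits = [row for row in scored if row[0] >= threshold]
        hits.sort(key=lambda row: row[0], reverse=True)
        return hits[:top_n]


def upload(index, doc_id, chunks):
    """Upload flow: embed each chunk, upsert it, return status and ids."""
    ids = []
    for position, chunk in enumerate(chunks):
        chunk_id = f"{doc_id}-{position}"
        index.upsert(chunk_id, embed(chunk), chunk)
        ids.append(chunk_id)
    return {"status": "ok", "ids": ids}


def retrieve(index, query, top_n=3, threshold=0.1):
    """Retrieve flow: embed the query, search with top N and a score threshold."""
    hits = index.search(embed(query), top_n, threshold)
    return [{"id": cid, "score": score, "text": chunk} for score, cid, chunk in hits]
```

The optional rerank/filter step from the diagram is left out; in this sketch it would be a function applied to the list returned by `retrieve`.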
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 8
🧹 Nitpick comments (1)
en/docs/ai/schedule-rag-automation.md (1)
29-29: Amazon S3 is listed as a supported datasource but has no corresponding setup instructions.
Step 2 only covers Google Drive. Users who select S3 will have no guidance.
Either add S3 configuration steps, or soften the claim to reflect that only Google Drive is currently documented:
✏️ Proposed interim fix
-The datasource specifies the location from which files will be ingested. Devant supports both Google Drive folders and Amazon S3 buckets as datasources.
+The datasource specifies the location from which files will be ingested. Devant supports Google Drive folders and Amazon S3 buckets as datasources. The steps below walk through Google Drive; refer to the platform documentation for S3 configuration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/docs/ai/schedule-rag-automation.md` at line 29, The docs claim that Devant supports Google Drive folders and Amazon S3 buckets but Step 2 only documents Google Drive; either add missing S3 setup instructions (how to create an S3 bucket, IAM policy/credentials, environment variables or config keys the app expects, and any required path/key format) and link them from the "datasource" paragraph, or change the "datasource specifies..." sentence to only mention Google Drive (e.g., "currently supports Google Drive folders") and note S3 is planned; update the "Step 2" heading to reflect the chosen change so S3 users are not left without guidance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/ai/rag-ingestion.md`:
- Line 13: Fix the capitalization and align the supported file-type list with
rag-service.md: change "PDFS" to "PDFs" and expand the list in the sentence that
starts with "Devant RAG ingestion has support for multiple file types" to
include PDF (including scanned), DOCX, PPTX, XLSX, HTML, CSV, MD, images, and
audio formats MP3, WAV, OGG, M4A, FLAC so the documentation in rag-ingestion.md
matches rag-service.md.
In `@en/docs/ai/rag-retrieval.md`:
- Line 50: Change the lowercase phrase "rag retrieval" to capitalized "RAG
retrieval" in the sentence "After completing the RAG ingestion process, you can
also implement a rag retrieval to connect your vector database with user queries
and generate responses using the WSO2 Integrator: BI." so the term matches the
acronym formatting; update the instance in the file content where "rag
retrieval" appears.
- Around line 45-46: Replace the incorrect admonition token "!!!info" with the
correct Markdown admonition syntax "!!! info" in the document (look for the
string "!!!info" in the content snippet) so the block renders properly; update
any identical occurrences in this file (and optionally mirror the same fix in
rag-service.md) to maintain consistent admonition formatting.
- Around line 6-9: Fix the prose and formatting in the RAG retrieval doc: change
the note marker from "!!!note" to "!!! note", correct "in to" to "into", change
"havent" to "haven't", and remove the duplicated phrase so "navigate to your
organization using the **Organization** dropdown in the in the top left of the
Devant console header" becomes "navigate to your organization using the
**Organization** dropdown in the top left of the Devant console header"; update
the note block and the navigation sentence accordingly in the rag-retrieval.md
content.
In `@en/docs/ai/rag-service.md`:
- Line 153: Fix the typo in the sentence containing the link text "[RAG
Ingesion]" by changing "RAG Ingesion" to "RAG Ingestion" so the sentence reads
"For more details on RAG ingestion and retrieval, and how to obtain API keys and
credentials refer the [RAG Ingestion](rag-ingestion.md) guide." Ensure only the
displayed link text is corrected without altering the target URL
"rag-ingestion.md".
- Around line 32-33: The admonition uses incorrect syntax "!!!note" which won't
be recognized; update the string in en/docs/ai/rag-service.md to include the
required space so it reads "!!! note" (match the correct usage already present
elsewhere in the file) to ensure Python-Markdown's admonition extension renders
the note block properly.
- Line 26: Fix the spelling mistake in the documentation sentence that currently
reads "On the devolopment environment card…" by changing "devolopment" to
"development" so the sentence becomes "On the development environment card,
click **Test** to open the OpenAPI Console…"; update this text in the
en/docs/ai/rag-service.md content where that sentence appears.
In `@en/docs/ai/schedule-rag-automation.md`:
- Line 36: Update the sentence about creating an API key to clarify that
standard Google Cloud API keys cannot access private Drive folders and to
specify what credential format the product's "API Key" field accepts (e.g.,
service account JSON, OAuth 2.0 client/refresh token, or domain-wide delegated
service account) and then replace or augment the existing link with direct links
to the appropriate workflows for creating service account credentials and OAuth
2.0 credentials (Google Drive service account JSON setup and OAuth
consent/refresh token guides) so users follow the correct credential creation
path for private Drive access.
---
Nitpick comments:
In `@en/docs/ai/schedule-rag-automation.md`:
- Line 29: The docs claim that Devant supports Google Drive folders and Amazon
S3 buckets but Step 2 only documents Google Drive; either add missing S3 setup
instructions (how to create an S3 bucket, IAM policy/credentials, environment
variables or config keys the app expects, and any required path/key format) and
link them from the "datasource" paragraph, or change the "datasource
specifies..." sentence to only mention Google Drive (e.g., "currently supports
Google Drive folders") and note S3 is planned; update the "Step 2" heading to
reflect the chosen change so S3 users are not left without guidance.
Actionable comments posted: 2
🧹 Nitpick comments (1)
en/docs/ai/rag-service.md (1)
27-30: `Try it out` should be bolded to match the UI-element formatting convention used elsewhere.
Every other interactive UI label in this list (**Test**, **Execute**) is bolded; `Try it out` on Line 28 is the odd one out.
✏️ Proposed fix
-3. Click Try it out to enable it.
+3. Click **Try it out** to enable it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@en/docs/ai/rag-service.md` around lines 27 - 30, Make the UI label "Try it out" bold to match the existing convention used for other interactive labels; locate the step text containing the plain "Try it out" in the steps list (the same block that contains "**Test**" and "**Execute**") and change it so the label is wrapped in markdown bold markers (i.e., make it "**Try it out**") to keep formatting consistent with the other UI elements.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 19: The sentence "Enter the **Collection Name**. The collection will be
automatically created if it does not exist." in step 3 is misleading for
retrieval-only flows; remove the second sentence and replace it with a short
note that the named collection must already exist and contain indexed documents
(e.g., "Enter the **Collection Name**. Note: for retrieval, the collection must
already exist and be populated with indexed documents; an empty or non-existent
collection will return no results."). Update the text in the
en/docs/ai/rag-retrieval.md step 3 accordingly.
In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 29-43: The docs currently say both Google Drive and Amazon S3 are
supported but only show Google Drive steps; add a parallel "Amazon S3"
subsection under the datasource selection that lists the required fields (Bucket
name/URL, Region, Access Key ID, Secret Access Key, optional Endpoint for
S3-compatible providers, and any IAM role or public/private access notes),
provide brief guidance on where to find the bucket name and region, and either
include example steps for entering these values in the UI or add a cross-link to
existing S3 configuration documentation if one exists (mirror the structure used
in the Google Drive steps and include an info block for important notes such as
bucket permissions).
---
Duplicate comments:
In `@en/docs/ai/rag-retrieval.md`:
- Around line 45-46: The admonition is written as '!!!info' which won’t render;
change the token to include a space — use '!!! info' (and keep the same
indentation level) so Python-Markdown's admonition extension recognizes it and
renders the reranking note as a styled info block.
- Around line 6-7: Fix the grammar in the note sentence inside the RAG retrieval
guide: change the line that currently reads "If you haven't already follow the
[Ingestion](rag-ingestion.md) guide on how you can do that." to a corrected,
concise phrasing such as "If you haven't already, follow the
[Ingestion](rag-ingestion.md) guide to ingest files into your vector store."
Ensure the comma after "already" is added and the trailing informal clause is
replaced with the clearer instruction; update the note block where this sentence
appears.
In `@en/docs/ai/rag-service.md`:
- Around line 152-153: Fix the ungrammatical sentence in the doc: replace the
line that currently reads "For more details on RAG ingestion and retrieval, and
how to obtain API keys and credentials refer the [RAG
Ingestion](rag-ingestion.md) guide." with a corrected version that inserts the
missing comma and corrects "refer the" to "refer to the", e.g. "For more details
on RAG ingestion and retrieval, and how to obtain API keys and credentials,
refer to the [RAG Ingestion](rag-ingestion.md) guide." Ensure you update the
exact sentence text in the en/docs/ai/rag-service.md content.
In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 35-38: Update the guidance in the "schedule-rag-automation.md" doc
so the public-folder limitation is explicit at the top of Step 2 rather than
only inside the collapsible info block: move or duplicate the sentence "The
target folder must be public ('Anyone with the link'), as API keys cannot access
private files." to the start of Step 2 and add a short sentence naming supported
alternatives (OAuth 2.0 or service account credentials) for users who need
private-folder access or note that private access is not supported if that is
intentional; modify the existing info block (the "Info" block shown) to
reference the top-level note and/or provide the OAuth/service-account options.
---
Nitpick comments:
In `@en/docs/ai/rag-service.md`:
- Around line 27-30: Make the UI label "Try it out" bold to match the existing
convention used for other interactive labels; locate the step text containing
the plain "Try it out" in the steps list (the same block that contains
"**Test**" and "**Execute**") and change it so the label is wrapped in markdown
bold markers (i.e., make it "**Try it out**") to keep formatting consistent with
the other UI elements.
The datasource specifies the location from which files will be ingested. Devant supports both Google Drive folders and Amazon S3 buckets as datasources.

1. Select `Google Drive` as the datasource.

2. Enter the **API Key** in the API key field.

???+ info "Info"
    To obtain a key, use the [Google Cloud Console](https://console.cloud.google.com/) to create a project and generate a key as described in the [Google Documentation](https://cloud.google.com/docs/authentication/api-keys#create) and restrict it to the **Google Drive API**.

    **Note:** The target folder must be **public** ("Anyone with the link"), as API keys cannot access private files.

3. Provide the **Folder ID** of the Google Drive folder containing the files to be ingested.

???+ info "Info"
    The Google Drive folder ID can be found in the URL when viewing the folder in Google Drive. It is the string that appears after `/folders/` in the URL.
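The folder ID rule described in the last info block (the string after `/folders/` in the URL) can be illustrated with a small helper. This is a sketch only: the function name `extract_folder_id` and the sample URL are made up for illustration, and nothing here is part of Devant.

```python
from urllib.parse import urlparse


def extract_folder_id(url):
    """Return the path segment that follows 'folders' in a Drive URL, or None."""
    # Query strings such as ?usp=sharing are dropped by urlparse(...).path.
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if "folders" in segments:
        position = segments.index("folders")
        if position + 1 < len(segments):
            return segments[position + 1]
    return None
```

For example, a hypothetical URL such as `https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing` would yield `1AbC_dEf-123`.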
S3 datasource is mentioned as supported but its configuration steps are not documented.
Line 29 explicitly states both Google Drive and Amazon S3 are supported, yet the guide only walks through Google Drive. Users choosing Amazon S3 as the datasource have no instructions for providing the bucket URL, region, access key, or secret key.
Either document the S3 configuration in a sub-section (parallel to the Google Drive sub-section), or note that S3 documentation is covered elsewhere and link to it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@en/docs/ai/schedule-rag-automation.md` around lines 29 - 43, The docs
currently say both Google Drive and Amazon S3 are supported but only show Google
Drive steps; add a parallel "Amazon S3" subsection under the datasource
selection that lists the required fields (Bucket name/URL, Region, Access Key
ID, Secret Access Key, optional Endpoint for S3-compatible providers, and any
IAM role or public/private access notes), provide brief guidance on where to
find the bucket name and region, and either include example steps for entering
these values in the UI or add a cross-link to existing S3 configuration
documentation if one exists (mirror the structure used in the Google Drive steps
and include an info block for important notes such as bucket permissions).
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 19: Fix the typo in the sentence "Enter the **Collection Name** from
which you want to retrive data." by changing "retrive" to "retrieve" so the line
reads "Enter the **Collection Name** from which you want to retrieve data.";
update that exact string in en/docs/ai/rag-retrieval.md.
- Line 24: Replace the incorrect brand spacing in the sentence "Select
`text-embedding-ada-002` embedding model from the **Open AI** dropdown." by
changing "**Open AI**" to "**OpenAI**" so the brand name is a single word
(consistent with the other occurrence on line 28); update the phrase in the same
string or markdown element where that sentence appears (search for
"text-embedding-ada-002" to locate it).
---
Duplicate comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 7: Update the sentence "This guide assumes you have already ingested
files into your vector store. If you haven't already follow the
[Ingestion](rag-ingestion.md) guide on how you can do that." by inserting a
comma after "already" so it reads "If you haven't already, follow the
[Ingestion](rag-ingestion.md) guide..." — locate the sentence in
en/docs/ai/rag-retrieval.md and apply the punctuation fix to the second
sentence.
|  | ||
|
|
||
| ### Step 4: Choose ingestion mode | ||
|
|
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@en/docs/ai/rag-retrieval.md`:
- Around line 45-46: The admonition block starting with '???+ info "Info"' has
its list item line indented with 5 spaces, which can render as a code block;
change the indentation of the line beginning with '-' to exactly 4 spaces so the
list item is parsed correctly (locate the '???+ info "Info"' block and the
subsequent line that currently starts with five spaces and reduce it to four).
- Line 52: The external link labeled "RAG retrieval"
(https://bi.docs.wso2.com/integration-guides/ai/rag/build-a-rag-application/#rag-retrieval)
in en/docs/ai/rag-retrieval.md is returning 403 and must be removed or replaced;
update the sentence that references the RAG retrieval tutorial to either (a)
point to a working public URL, (b) embed the essential implementation
steps/configuration directly into this doc under a new "RAG retrieval"
subsection, or (c) remove the cross-reference entirely and adjust surrounding
text accordingly so there are no broken links or references to the inaccessible
WSO2 Integrator: BI documentation.
In `@en/docs/ai/rag-service.md`:
- Line 53: The docs use two different names for the same parameter — "Max
segment size" in the /upload section and "Max chunk size" in the /chunks
section; standardize on a single term (use "Max chunk size") by replacing all
occurrences of "Max segment size" in the /upload endpoint text with "Max chunk
size" and ensure any related descriptions, examples, and parameter headings in
both the /upload and /chunks sections reference the same exact phrase so the
parameter name is consistent across the documentation.
- Around line 18-28: Fix minor prose issues in the Create/Test service steps:
change "Click the **Create Service** button and your service will get created "
to "Click the **Create Service** button and your service will be created."
(remove trailing space and add period), add a comma in "Once the component is
created you will be redirected to the Overview page." so it reads "Once the
component is created, you will be redirected to the Overview page.", and
bold-format the UI element "Try it out" so it appears as **Try it out** (to
match the already bolded **Execute**).
In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 10-11: Update the prerequisite reference in
schedule-rag-automation.md so it correctly matches the current rag-ingestion.md:
either change the text that currently says "steps 1 to 4" to "steps 1 to 6" or
remove the specific step count and refer generically to "the RAG ingestion
guide" (e.g., "This guide assumes you have completed the steps in the RAG
ingestion guide.") to avoid mismatches; locate and edit the line that mentions
rag-ingestion.md to apply the chosen wording.
---
Duplicate comments:
In `@en/docs/ai/rag-retrieval.md`:
- Line 7: The sentence in the documentation line "If you haven't already follow
the [Ingestion](rag-ingestion.md) guide on how you can do that." is missing a
comma; update the sentence in en/docs/ai/rag-retrieval.md to read "If you
haven't already, follow the [Ingestion](rag-ingestion.md) guide on how you can
do that." so it is grammatically correct.
In `@en/docs/ai/rag-service.md`:
- Line 151: Update the sentence fragment "For more details on RAG ingestion and
retrieval, and how to obtain API keys and credentials refer the [RAG
Ingestion](rag-ingestion.md) guide." to read "For more details on RAG ingestion
and retrieval, and how to obtain API keys and credentials, refer to the [RAG
Ingestion](rag-ingestion.md) guide." — locate the exact string in rag-service.md
and insert the comma and the word "to" as shown.
In `@en/docs/ai/schedule-rag-automation.md`:
- Around line 29-43: Add a parallel Amazon S3 subsection under the "datasource"
section to document S3-specific fields and steps: specify required inputs
(Bucket name, Region, Access Key ID, Secret Access Key, optional Session Token,
and optional Prefix/path), describe how to find the bucket name and region,
explain required IAM permissions or public access settings, and include an
example or note about setting object ACLs or making objects accessible;
alternatively add a clear cross-link from the Google Drive steps to a new or
existing "S3 datasource configuration" doc and ensure the terms "Amazon S3",
"S3", and "datasource" are mentioned so users selecting S3 can find the
configuration guidance.
???+ info "Info"
     - Devant's retrieval process uses a reranking model to ensure that only the most accurate and contextually relevant chunks are returned.
???+ block content indented with 5 spaces instead of 4 — potential rendering issue.
Python-Markdown's admonition extension expects exactly 4 spaces of indentation for block content. The current 5-space indent (`     -`) could cause the line to be parsed as a code block rather than a list item within the collapsible note.
✏️ Proposed fix
 ???+ info "Info"
-     - Devant's retrieval process uses a reranking model to ensure that only the most accurate and contextually relevant chunks are returned.
+    - Devant's retrieval process uses a reranking model to ensure that only the most accurate and contextually relevant chunks are returned.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@en/docs/ai/rag-retrieval.md` around lines 45 - 46, The admonition block
starting with '???+ info "Info"' has its list item line indented with 5 spaces,
which can render as a code block; change the indentation of the line beginning
with '-' to exactly 4 spaces so the list item is parsed correctly (locate the
'???+ info "Info"' block and the subsequent line that currently starts with five
spaces and reduce it to four).
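Both admonition problems raised in this review (a missing space after `!!!` or `???+`, and body content not indented by exactly four spaces) are mechanical enough to check with a script. The sketch below is a hypothetical helper, not part of the docs toolchain; it assumes the four-space rule as stated in the comment above and only inspects the first non-blank line after each admonition header.

```python
import re

# Matches an admonition header: optional indent, marker, then the rest of the line.
HEADER = re.compile(r"^(?P<indent> *)(?P<marker>!!!|\?\?\?\+?)(?P<rest>.*)$")


def lint_admonitions(markdown):
    """Return (line_number, message) pairs for suspicious admonition syntax."""
    problems = []
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        match = HEADER.match(line)
        if not match:
            continue
        rest = match.group("rest")
        if rest and not rest[0].isspace():
            problems.append((i + 1, "missing space after admonition marker"))
        header_indent = len(match.group("indent"))
        expected = header_indent + 4
        # Check only the first non-blank line that follows the header.
        for j in range(i + 1, len(lines)):
            body = lines[j]
            if not body.strip():
                continue
            body_indent = len(body) - len(body.lstrip(" "))
            if body_indent > header_indent and body_indent != expected:
                problems.append(
                    (j + 1, f"body indented {body_indent} spaces, expected {expected}")
                )
            break
    return problems
```

Running it over a file's text returns an empty list when every admonition header has the space and a four-space body indent.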
After completing the RAG ingestion process, you can also implement a RAG retrieval to connect your vector database with user queries and generate responses using the WSO2 Integrator: BI.

For detailed implementation steps and configuration, refer to the [RAG retrieval](https://bi.docs.wso2.com/integration-guides/ai/rag/build-a-rag-application/#rag-retrieval) tutorial in the WSO2 Integrator: BI documentation.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the HTTP status of the WSO2 BI RAG tutorial link
curl -o /dev/null -s -w "%{http_code}\n" \
"https://bi.docs.wso2.com/integration-guides/ai/rag/build-a-rag-application/#rag-retrieval"
Repository: wso2/docs-devant
Length of output: 63
Remove or replace the inaccessible external link. The referenced WSO2 Integrator: BI documentation link returns a 403 Forbidden status, making it inaccessible to users. Either provide a working link, embed the relevant information directly in the documentation, or remove the reference if the external resource is no longer available.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@en/docs/ai/rag-retrieval.md` at line 52, The external link labeled "RAG
retrieval"
(https://bi.docs.wso2.com/integration-guides/ai/rag/build-a-rag-application/#rag-retrieval)
in en/docs/ai/rag-retrieval.md is returning 403 and must be removed or replaced;
update the sentence that references the RAG retrieval tutorial to either (a)
point to a working public URL, (b) embed the essential implementation
steps/configuration directly into this doc under a new "RAG retrieval"
subsection, or (c) remove the cross-reference entirely and adjust surrounding
text accordingly so there are no broken links or references to the inaccessible
WSO2 Integrator: BI documentation.
Click the **Create Service** button and your service will get created

!!! note
    - When the service is created, Devant automatically increases the container resources (CPU and memory) to ensure reliable operation.

### Step 2: Test endpoints
Once the component is created you will be redirected to the Overview page.

1. On the development environment card, click **Test** to open the OpenAPI Console, where you will be able to try out all the available endpoints from the endpoint list.
2. Expand the resource you want to test.
3. Click Try it out to enable it.
Several minor prose issues in the Create/Test service steps.
- Line 18: missing period and trailing space; "will get created" → "will be created."
- Line 24: missing comma — "Once the component is created**,** you will be redirected…"
- Line 28: `Try it out` is a UI element and should be bold-formatted, consistent with **Execute** on line 30.
✏️ Proposed fix
-Click the **Create Service** button and your service will get created
+Click the **Create Service** button and your service will be created.
-Once the component is created you will be redirected to the Overview page.
+Once the component is created, you will be redirected to the Overview page.
-3. Click Try it out to enable it.
+3. Click **Try it out** to enable it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@en/docs/ai/rag-service.md` around lines 18 - 28, Fix minor prose issues in
the Create/Test service steps: change "Click the **Create Service** button and
your service will get created " to "Click the **Create Service** button and your
service will be created." (remove trailing space and add period), add a comma in
"Once the component is created you will be redirected to the Overview page." so
it reads "Once the component is created, you will be redirected to the Overview
page.", and bold-format the UI element "Try it out" so it appears as **Try it
out** (to match the already bolded **Execute**).
- Collection name: where data will be stored.
- Embedding model provider and model (e.g., OpenAI, Azure, Mistral) and API key.
- Chunking strategy: `recursive`, `sentence`, or `character`.
- Max segment size and max overlap size for chunking.
Terminology inconsistency between /upload and /chunks: "Max segment size" vs "Max chunk size".
Line 53 (/upload) calls the parameter "Max segment size", while line 118 (/chunks) calls the same parameter "Max chunk size". Using different names for the same field across two closely related endpoints will confuse users who reference both sections while implementing API calls.
Pick one term (e.g., "Max chunk size") and use it consistently across both endpoints.
✏️ Proposed fix
-- Max segment size and max overlap size for chunking.
+- Max chunk size and max overlap size for chunking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@en/docs/ai/rag-service.md` at line 53, The docs use two different names for
the same parameter — "Max segment size" in the /upload section and "Max chunk
size" in the /chunks section; standardize on a single term (use "Max chunk
size") by replacing all occurrences of "Max segment size" in the /upload
endpoint text with "Max chunk size" and ensure any related descriptions,
examples, and parameter headings in both the /upload and /chunks sections
reference the same exact phrase so the parameter name is consistent across the
documentation.
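To make the two parameters concrete, here is a minimal sketch of character-based chunking with a maximum chunk size and an overlap. It is an illustration under stated assumptions, not the service's chunker: the function name `chunk_text` and its parameter names are hypothetical, and the `recursive` and `sentence` strategies are not covered.

```python
def chunk_text(text, max_chunk_size, max_overlap_size):
    """Split text into fixed-size character chunks that overlap by a few characters."""
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= max_overlap_size < max_chunk_size:
        raise ValueError("max_overlap_size must be >= 0 and smaller than max_chunk_size")

    chunks = []
    start = 0
    step = max_chunk_size - max_overlap_size  # how far the window advances each time
    while start < len(text):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

With a chunk size of 4 and an overlap of 1, each chunk starts on the last character of the previous one, which is why a single, consistently named size parameter matters when reading both endpoint sections side by side.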
- Schedule RAG ingestion option is available only for paid Devant users.
- This guide assumes you have completed steps 1 to 4 in the [RAG ingestion](rag-ingestion.md) guide.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find and display all step headings in rag-ingestion.md
fd -e md -g "rag-ingestion.md" --exec cat -n {} | grep -E "^[^:]*:\s*#{2,4} (Step|step)"
Repository: wso2/docs-devant
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Find rag-ingestion.md in the repository
git ls-files | grep -i "rag-ingestion"
Repository: wso2/docs-devant
Length of output: 258
🏁 Script executed:
#!/bin/bash
# Read rag-ingestion.md and find all step headings
cat -n en/docs/ai/rag-ingestion.md | grep -E "#{2,4} Step"
Repository: wso2/docs-devant
Length of output: 295
🏁 Script executed:
#!/bin/bash
# View the exact content around line 11 in schedule-rag-automation.md to confirm the reference
head -15 en/docs/ai/schedule-rag-automation.md | tail -8
Repository: wso2/docs-devant
Length of output: 369
Update the prerequisite step reference to match the current RAG ingestion guide.
Line 11 references "steps 1 to 4" in the RAG ingestion guide, but rag-ingestion.md now contains 6 steps (Step 1–6: Initialize vector store, Configure embedding model, Configure chunking, Choose ingestion mode, Upload source files, and Verify). Either update the reference to "steps 1 to 6" or remove the specific step count if only a subset is actually required as a prerequisite.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@en/docs/ai/schedule-rag-automation.md` around lines 10 - 11, Update the
prerequisite reference in schedule-rag-automation.md so it correctly matches the
current rag-ingestion.md: either change the text that currently says "steps 1 to
4" to "steps 1 to 6" or remove the specific step count and refer generically to
"the RAG ingestion guide" (e.g., "This guide assumes you have completed the
steps in the RAG ingestion guide.") to avoid mismatches; locate and edit the
line that mentions rag-ingestion.md to apply the chosen wording.
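A cross-reference such as "steps 1 to 4" can be checked against the guide it points to by listing the step headings in that file. The following is a hedged sketch of that check; the helper name `step_numbers` is hypothetical and it assumes headings of the form `### Step N: ...`, as matched by the `grep` pattern in the analysis above.

```python
import re


def step_numbers(markdown):
    """Return the numbers found in '## Step N' to '#### Step N' headings, in order."""
    return [
        int(number)
        for number in re.findall(r"^#{2,4} Step (\d+)", markdown, flags=re.MULTILINE)
    ]
```

Comparing `max(step_numbers(text))` with the number quoted in the prerequisite sentence would flag the mismatch described in this comment.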
Description
This PR introduces several improvements and updates to the RAG (Retrieval-Augmented Generation) documentation, including:
Documentation Restructuring and Navigation Updates:
- Split `rag-application.md` into four focused guides: `rag-ingestion.md`, `rag-retrieval.md`, `schedule-rag-automation.md`, and `rag-service.md`, each covering a specific aspect of RAG workflows.
- Updated `mkdocs.yml` to group RAG documentation under a new "RAG" section, with clear links to the Ingestion, Retrieval, Schedule Automation, and Service guides.
RAG Ingestion Guide Improvements:
New Guides and Features:
Resolves issue #2201, #2060